Material spectroscopy

ABSTRACT

A computer, including a processor and a memory, the memory including instructions to be executed by the processor to acquire a first image by illuminating a first object with a first light beam, segment the first image of the first object to determine regions that correspond to a first surface material and determine a first measure of pixel values in regions of the first image that correspond to the first surface material. The instructions include further instructions to perform a comparison of the first measure of pixel values to a second measure of pixel values determined from a second image of a second object, wherein the second image is previously acquired by illuminating the second object with a second light beam and when the comparison determines that the first measure is equal to the second measure of pixel values within a tolerance, determine that the first object and the second object are a same object.

BACKGROUND

Vehicles can be equipped with computing devices, networks, sensors, and controllers to acquire and/or process data regarding the vehicle's environment and to operate the vehicle based on the data. Vehicle sensors can provide data concerning routes to be traveled and objects to be avoided in the vehicle's environment. Operation of the vehicle can rely upon acquiring accurate and timely data regarding objects in a vehicle's environment while the vehicle is being operated on a roadway.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an example vehicle.

FIG. 2 is a diagram of example near infrared image histograms.

FIG. 3 is a diagram of example near infrared images and histograms.

FIG. 4 is a diagram of an example near infrared image with a human face detected.

FIG. 5 is a diagram of an example near infrared image with an isolated human face and a histogram.

FIG. 6 is a diagram of an example near infrared image of a fake human face and a histogram.

FIG. 7 is a diagram of an example near infrared image of a human face and an image of a segmented human face.

FIG. 8 is a diagram of an example near infrared image of a segmented human face and a histogram.

FIG. 9 is a diagram of an example near infrared image of a segmented fake human face and a histogram.

FIG. 10 is a diagram of an example masked near infrared image of a human face and a histogram.

FIG. 11 is a diagram of an example masked near infrared image of a fake human face and a histogram.

FIG. 12 is a diagram of an example masked near infrared image of a human face acquired at a first distance.

FIG. 13 is a diagram of an example masked near infrared image of a human face acquired at a second distance.

FIG. 14 is a flowchart diagram of an example process to determine near infrared images of real and fake human faces.

DETAILED DESCRIPTION

A computing device in a traffic infrastructure system can be programmed to acquire data regarding the external environment of a vehicle and to use the data to operate the vehicle. For example, a camera in a vehicle can be programmed to acquire an image of a human approaching the vehicle and, upon determining the identity of the human based on facial recognition software, unlock the vehicle's doors to permit the operator to enter the vehicle. Likewise, cameras included in the interior of the vehicle can acquire one or more images of a human and, upon determining the identity of the operator based on facial recognition software, accept commands from the human to operate the vehicle.

A computing device in a vehicle can be programmed to perform facial recognition of a human by first acquiring a trained model during enrollment, where an image of the human face to be identified is acquired. The computing device can then acquire a challenge image that includes a human face and process the challenge image to determine whether the challenge image includes a human face that matches the trained model. Facial recognition is a type of biometric authentication, where human body measurements are used to determine an identity of a human to perform access control. Biometric authentication can be used to control access to buildings, homes, or vehicles, and can be used to grant permission to operate computers, phones, or other devices. Biometric authentication software can be executed on a computing device included in the location or device being accessed, or the image data can be uploaded to a cloud-based server that maintains a database of trained models for execution. The results of performing the biometric authentication can be downloaded to the device seeking authentication and permission to operate or access the location or device can be granted or denied.

Biometric facial recognition typically operates by calculating physiological characteristics of a human face and comparing the calculated physiological characteristics to stored physiological characteristics from the trained model. Physiological characteristics can include measures of facial features such as the distance between pupils, distance between corners of the mouth and length of nose, etc. These measures can be normalized by forming ratios of the measurements and stored as the trained model. At challenge time, an image of the human seeking access is acquired and processed to extract physiological characteristics which are then compared to stored physiological characteristics to determine a match.
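The normalization by ratios described above can be illustrated with a minimal sketch. The landmark names and the choice of the pupil distance as the common denominator are assumptions made for illustration only; the disclosure does not specify which ratios are formed.

```python
import math

def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def normalized_features(landmarks):
    """Form scale-independent ratios from facial measurements.

    `landmarks` maps hypothetical landmark names to (x, y) pixel
    coordinates. Dividing by the pupil distance is an assumption.
    """
    pupil = distance(landmarks["left_pupil"], landmarks["right_pupil"])
    mouth = distance(landmarks["mouth_left"], landmarks["mouth_right"])
    nose = distance(landmarks["nose_top"], landmarks["nose_tip"])
    return {"mouth_to_pupil": mouth / pupil, "nose_to_pupil": nose / pupil}

face = {
    "left_pupil": (30, 40), "right_pupil": (70, 40),
    "mouth_left": (35, 80), "mouth_right": (65, 80),
    "nose_top": (50, 45), "nose_tip": (50, 65),
}
features = normalized_features(face)
```

Because every measure is divided by another measure from the same image, the same face imaged at a different scale yields the same ratios.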

An issue with biometric facial recognition is “spoofing.” Spoofing occurs when a non-authorized user seeks to gain access to a location or device using a fraudulent version of an authorized user's facial features. Fraudulent versions of an authorized user's facial features can include color photographs, for example. Biometric facial recognition systems can use three-dimensional sensors such as laser range detectors or lidars to prevent a non-authorized user from using a flat, two-dimensional photograph to spoof the system. Non-authorized users have attempted to circumvent biometric facial recognition systems by using three-dimensional (3D) masks that conform to a user's general facial shape while including facial features belonging to an authorized user. These masks can range from inexpensive printed LYCRA® face masks to custom-made silicone face masks used in motion pictures, for example.

Techniques discussed herein improve biometric facial recognition by using spectral characteristics of human facial features to authenticate liveness in acquired image data. Liveness means that image data represents an actual (and not a spoofed) human face. Liveness authentication means distinguishing between a live human face and fraudulent versions including 3D masks in acquired near infrared (NIR) images. These techniques illuminate the challenge human face with NIR illumination and acquire an image with an image sensor that includes red, green, blue and NIR sensing elements to form a red, green, blue (RGB)/NIR image by acquiring near infrared pixels, red pixels, green pixels, and blue pixels. An RGB/NIR image is also referred to as a color image herein. The RGB/NIR or color image can be illuminated with both NIR light and white light or illuminated with NIR light and ambient light. The NIR and RGB response is analyzed to determine whether a face in the challenge image belongs to a live human or a fraudulent reproduction. If it is determined that the face belongs to a live human, the challenge image is passed on to a biometric facial recognition system for further processing; otherwise access is denied. Techniques discussed herein can compensate for differences in ambient illumination, determine liveness based on segmenting the challenge image, and compensate for differences in distance from the sensor. Techniques discussed herein can be used to determine properties of materials in addition to human faces. Spectral properties of near infrared images can be used to distinguish real from counterfeit goods by distinguishing real leather from imitation leather, for example. In another example, a manufacturing application could determine that parts being installed in a product such as a vehicle are made of the correct material. In other examples, photographs of materials can be distinguished from near infrared images of the actual materials to verify goods for sale over the Internet.

FIG. 1 is a diagram of a vehicle 110 operable in autonomous (“autonomous” by itself in this disclosure means “fully autonomous”), semi-autonomous, and occupant piloted (also referred to as non-autonomous) mode. One or more vehicle 110 computing devices 115 can receive data regarding the operation of the vehicle 110 from sensors 116. The computing device 115 may operate and/or monitor the vehicle 110 in an autonomous mode, a semi-autonomous mode, or a non-autonomous mode, i.e., can control and/or monitor operation of the vehicle 110, including controlling and/or monitoring components of the vehicle as described hereinbelow.

The computing device (or computer) 115 includes a processor and a memory such as are known. Further, the memory includes one or more forms of computer-readable media, and stores instructions executable by the processor for performing various operations, including as disclosed herein. For example, the computing device 115 may include programming to operate one or more of vehicle brakes, propulsion (e.g., control of acceleration in the vehicle 110 by controlling one or more of an internal combustion engine, electric motor, hybrid engine, etc.), steering, climate control, interior and/or exterior lights, etc., as well as to determine whether and when the computing device 115, as opposed to a human operator, is to control such operations.

The computing device 115 may include or be communicatively coupled to, e.g., via a vehicle communications bus as described further below, more than one computing device, e.g., controllers or the like included in the vehicle 110 for monitoring and/or controlling various vehicle components, e.g., a powertrain controller 112, a brake controller 113, a steering controller 114, etc. The computing device 115 is generally arranged for communications on a vehicle communication network, e.g., including a bus in the vehicle 110 such as a controller area network (CAN) or the like; the vehicle 110 network can additionally or alternatively include wired or wireless communication mechanisms such as are known, e.g., Ethernet or other communication protocols.

Via the vehicle network, the computing device 115 may transmit messages to various devices in the vehicle and/or receive messages from the various devices, e.g., controllers, actuators, sensors, etc., including sensors 116. Alternatively, or additionally, in cases where the computing device 115 actually comprises multiple devices, the vehicle communication network may be used for communications between devices represented as the computing device 115 in this disclosure. Further, as mentioned below, various controllers or sensing elements such as sensors 116 may provide data to the computing device 115 via the vehicle communication network.

In addition, the computing device 115 may be configured for communicating through a vehicle-to-infrastructure (V-to-I) interface 111 with a remote server computer, e.g., a cloud server, via a network, which, as described below, includes hardware, firmware, and software that permits computing device 115 to communicate with a remote server computer via a network such as wireless Internet (WI-FI®) or cellular networks. V-to-I interface 111 may accordingly include processors, memory, transceivers, etc., configured to utilize various wired and/or wireless networking technologies, e.g., cellular, BLUETOOTH®, Ultra-Wide Band (UWB)®, and wired and/or wireless packet networks. Computing device 115 may be configured for communicating with other vehicles 110 through V-to-I interface 111 using vehicle-to-vehicle (V-to-V) networks, e.g., according to Dedicated Short Range Communications (DSRC) and/or the like, e.g., formed on an ad hoc basis among nearby vehicles 110 or formed through infrastructure-based networks. The computing device 115 also includes nonvolatile memory such as is known. Computing device 115 can log data by storing the data in nonvolatile memory for later retrieval and transmittal via the vehicle communication network and a vehicle-to-infrastructure (V-to-I) interface 111 to a server computer or user mobile device.

As already mentioned, generally included in instructions stored in the memory and executable by the processor of the computing device 115 is programming for operating one or more vehicle 110 components, e.g., braking, steering, propulsion, etc., without intervention of a human operator. Using data received in the computing device 115, e.g., the sensor data from the sensors 116, the server computer, etc., the computing device 115 may make various determinations and/or control various vehicle 110 components and/or operations without a driver to operate the vehicle 110. For example, the computing device 115 may include programming to regulate vehicle 110 operational behaviors (i.e., physical manifestations of vehicle 110 operation) such as speed, acceleration, deceleration, steering, etc., as well as tactical behaviors (i.e., control of operational behaviors typically in a manner intended to achieve safe and efficient traversal of a route) such as a distance between vehicles and/or amount of time between vehicles, lane-change, minimum gap between vehicles, left-turn-across-path minimum, time-to-arrival at a particular location and intersection (without signal) minimum time-to-arrival to cross the intersection.

The one or more controllers 112, 113, 114 for the vehicle 110 may include known electronic control units (ECUs) or the like including, as non-limiting examples, one or more powertrain controllers 112, one or more brake controllers 113, and one or more steering controllers 114. Each of the controllers 112, 113, 114 may include respective processors and memories and one or more actuators. The controllers 112, 113, 114 may be programmed and connected to a vehicle 110 communications bus, such as a controller area network (CAN) bus or local interconnect network (LIN) bus, to receive instructions from the computing device 115 and control actuators based on the instructions.

Sensors 116 may include a variety of devices known to share data via the vehicle communications bus. For example, a radar fixed to a front bumper (not shown) of the vehicle 110 may provide a distance from the vehicle 110 to a next vehicle in front of the vehicle 110, or a global positioning system (GPS) sensor disposed in the vehicle 110 may provide geographical coordinates of the vehicle 110. The distance(s) provided by the radar and/or other sensors 116 and/or the geographical coordinates provided by the GPS sensor may be used by the computing device 115 to operate the vehicle 110 autonomously or semi-autonomously, for example.

The vehicle 110 is generally a land-based vehicle 110 capable of autonomous and/or semi-autonomous operation and having three or more wheels, e.g., a passenger car, light truck, etc. The vehicle 110 includes one or more sensors 116, the V-to-I interface 111, the computing device 115 and one or more controllers 112, 113, 114. The sensors 116 may collect data related to the vehicle 110 and the environment in which the vehicle 110 is operating. By way of example, and not limitation, sensors 116 may include, e.g., altimeters, cameras, lidar, radar, ultrasonic sensors, infrared sensors, pressure sensors, accelerometers, gyroscopes, temperature sensors, Hall sensors, optical sensors, voltage sensors, current sensors, mechanical sensors such as switches, etc. The sensors 116 may be used to sense the environment in which the vehicle 110 is operating, e.g., sensors 116 can detect phenomena such as weather conditions (precipitation, external ambient temperature, etc.), the grade of a road, the location of a road (e.g., using road edges, lane markings, etc.), or locations of target objects such as neighboring vehicles 110. The sensors 116 may further be used to collect data including dynamic vehicle 110 data related to operations of the vehicle 110 such as velocity, yaw rate, steering angle, engine speed, brake pressure, oil pressure, the power level applied to controllers 112, 113, 114 in the vehicle 110, connectivity between components, and accurate and timely performance of components of the vehicle 110.

Vehicles can be equipped to operate in both autonomous and occupant piloted mode. By a semi- or fully-autonomous mode, we mean a mode of operation wherein a vehicle can be piloted partly or entirely by a computing device as part of a system having sensors and controllers. The vehicle can be occupied or unoccupied, but in either case the vehicle can be partly or completely piloted without assistance of an occupant. For purposes of this disclosure, an autonomous mode is defined as one in which each of vehicle propulsion (e.g., via a powertrain including an internal combustion engine and/or electric motor), braking, and steering are controlled by one or more vehicle computers; in a semi-autonomous mode the vehicle computer(s) control(s) one or more of vehicle propulsion, braking, and steering. In a non-autonomous mode, none of these are controlled by a computer.

FIG. 2 is a diagram of three histograms 200, 204, 208 of image pixel intensity. Histograms discussed herein, including the histograms 200, 204, 208, display, for respective images, a measure of pixel values of various intensities in the image, e.g., the histograms 200, 204, 208 are formed or generated by counting the number of pixels at each pixel intensity in an image and plotting the counted number of pixels on the Y-axis (COUNT) against the pixel intensities on the X-axis (INTENSITY). The images from which the histograms 200, 204, 208 are determined are acquired by illuminating a scene with a near infrared (NIR) light. NIR light has a wavelength of between 800 and 2,500 nanometers (nm). In this example the NIR light can have a wavelength of about 850 nm or 940 nm. The NIR light can be acquired with a camera that includes a solid-state sensor that is sensitive to NIR light. Solid-state sensors manufactured using CMOS technology are naturally sensitive to NIR light and typically require an infrared blocking optical filter if NIR light is unwanted. Sensors are available that include RGB filtered photo sites in addition to unfiltered photo sites in a mosaic arrangement to produce images that include RGB and NIR pixels. Still image cameras and video cameras can include RGB-NIR filters to produce RGB-NIR images. The resulting RGB-NIR images can be displayed to produce a sum of visible (RGB) and NIR pixels or the NIR pixels can be extracted to form an NIR image.
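Counting the pixels at each intensity, as described above, can be sketched as follows. The sketch assumes an 8-bit NIR image already extracted as a NumPy array; the function name and the 256-level default are illustrative and not part of the disclosure.

```python
import numpy as np

def intensity_histogram(nir_image, levels=256):
    """Count the number of pixels at each intensity value.

    Assumes `nir_image` holds non-negative integer intensities
    below `levels` (e.g., an 8-bit NIR channel).
    """
    pixels = np.asarray(nir_image).ravel()
    # Index i of the result is the COUNT for INTENSITY i.
    return np.bincount(pixels, minlength=levels)

image = np.array([[0, 0, 5], [5, 5, 255]], dtype=np.uint8)
hist = intensity_histogram(image)
```

Plotting `hist` against its index reproduces the COUNT versus INTENSITY axes of the histograms 200, 204, 208.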

Techniques discussed herein include illuminating a scene with an NIR light and acquiring an image of the illuminated scene with a CMOS sensor configured to acquire NIR light. Techniques discussed herein will also work with other types of illumination and other types of sensors. For example, the scene can be illuminated with one or more wavelengths of visible light and an image acquired using an unmodified RGB image sensor. Any wavelength of short wave infrared (SWIR) light can be used with the techniques discussed herein. SWIR light refers to infrared light that is reflected by objects as opposed to long wavelength infrared, which can be emitted by objects. The infrared wavelengths discussed above are employed because they can be emitted, focused and acquired using relatively inexpensive lights, lenses and sensors and tend to have less competing ambient illumination.

Histograms 200, 204, 208 were generated from an image of a live human face, a picture of a human face, and a modified picture of a human face, respectively. Analysis of the distribution of pixel counts 202, 206, 210 in each of the histograms 200, 204, 208 can distinguish between a live human face, a picture of a human face, and a modified picture of a human face. Analysis of the distribution of pixel counts 202, 206, 210 can be performed by assuming that the distributions are Gaussian, and fitting a Gaussian distribution to the distributions of pixel counts 202, 206, 210. A Gaussian distribution G is described in terms of its mean value m, standard deviation σ and height a by the formula:

$\begin{matrix}{G = {{f(x)} = {a \cdot {\exp\left( {- \frac{\left( {x - m} \right)^{2}}{2\sigma^{2}}} \right)}}}} & (1)\end{matrix}$

Fitting a Gaussian curve determines the parameters m, σ, and a that minimize a sum of squared differences between the Gaussian curve and the distribution of pixel counts 202, 206, 210.
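A least-squares fit of equation (1) to a histogram can be sketched as below. The disclosure does not name a solver; this sketch assumes a plain Gauss-Newton iteration started from the histogram's own mean, spread and peak, with hypothetical function names. It is meant for roughly bell-shaped data and is not a robust general-purpose fitter.

```python
import numpy as np

def gaussian(x, a, m, sigma):
    """Equation (1): height a, mean m, standard deviation sigma."""
    return a * np.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))

def fit_gaussian(x, counts, iterations=50):
    """Minimize the sum of squared differences by Gauss-Newton steps."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(counts, dtype=float)
    # Initial guess taken from the data (an assumption of this sketch).
    total = y.sum()
    m = (x * y).sum() / total
    sigma = np.sqrt((((x - m) ** 2) * y).sum() / total)
    a = y.max()
    for _ in range(iterations):
        g = gaussian(x, a, m, sigma)
        residual = y - g
        # Partial derivatives of the curve with respect to a, m, sigma.
        jacobian = np.column_stack([
            g / a,
            g * (x - m) / sigma ** 2,
            g * (x - m) ** 2 / sigma ** 3,
        ])
        step, *_ = np.linalg.lstsq(jacobian, residual, rcond=None)
        a, m, sigma = a + step[0], m + step[1], sigma + step[2]
        if np.abs(step).max() < 1e-12:
            break
    return a, m, abs(sigma)

intensities = np.arange(256)
synthetic_counts = gaussian(intensities, 500.0, 100.0, 12.0)
a_fit, m_fit, sigma_fit = fit_gaussian(intensities, synthetic_counts)
```

On noisy or multi-modal histograms an undamped Gauss-Newton step can diverge, so a production system would more likely rely on a library optimizer with damping or bounds.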

Additional parameters that can be determined based on a Gaussian curve are skewness and kurtosis. Skewness is a parameter that measures the symmetry of count data with respect to the mean m. Skewness compares the mass of count data included in the Gaussian curve on either side of the mean m. Skewness can be measured by determining the third standardized moment μ₃ about the mean m as determined by the equation:

$\begin{matrix}{{\bar{\mu}}_{3} = {E\left\lbrack \left( \frac{G - m}{\sigma} \right)^{3} \right\rbrack}} & (2)\end{matrix}$

Where E is the expectation operator, G is the Gaussian distribution, m is the mean and σ is the standard deviation as above. Kurtosis is a parameter that measures the “tailedness” of a Gaussian distribution, where tailedness is a measure of the amount of data in the tails or extremes of a Gaussian distribution compared to the central portion around the mean m. Kurtosis can be measured by determining the fourth standardized moment μ₄ about the mean m according to the equation:

$\begin{matrix}{{\bar{\mu}}_{4} = {E\left\lbrack \left( \frac{G - m}{\sigma} \right)^{4} \right\rbrack}} & (3)\end{matrix}$

Where E is the expectation operator, G is the Gaussian distribution, m is the mean and σ is the standard deviation as above. Gaussian parameters including skewness μ₃ and kurtosis μ₄ in addition to mean m, standard deviation σ and height a can be determined and used to characterize Gaussian curves.
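The standardized moments of equations (2) and (3) can be estimated directly from histogram counts, as in the following sketch. It assumes the expectation is taken over pixel intensities weighted by their normalized counts; the function name is hypothetical.

```python
import numpy as np

def histogram_moments(counts):
    """Return mean, standard deviation, skewness and kurtosis.

    Intensities 0..N-1 are weighted by their normalized counts, which
    is this sketch's interpretation of the expectation operator E.
    """
    counts = np.asarray(counts, dtype=float)
    x = np.arange(counts.size)
    p = counts / counts.sum()               # probability of each intensity
    m = (x * p).sum()                       # mean
    sigma = np.sqrt((((x - m) ** 2) * p).sum())
    z = (x - m) / sigma                     # standardized intensity
    skewness = (z ** 3 * p).sum()           # third standardized moment
    kurtosis = (z ** 4 * p).sum()           # fourth standardized moment
    return m, sigma, skewness, kurtosis

x = np.arange(256)
bell = np.exp(-((x - 100.0) ** 2) / (2.0 * 12.0 ** 2))
m, sigma, skewness, kurtosis = histogram_moments(bell)
```

A symmetric bell-shaped histogram gives a skewness near zero and a kurtosis near three, so departures from those values indicate asymmetry or heavier or lighter tails.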

Examination of Gaussian curves corresponding to the distributions of pixel counts 202, 206 corresponding to a live human face (pixel counts 202) and a picture of the same human face (pixel counts 206) yields a quantifiable distinction between the standard deviations of the distributions. The distribution of pixel counts 206 corresponding to the picture of the human face in this example has a standard deviation that is greater than twice the standard deviation of the distribution of pixel counts 202 corresponding to the live human face. In histogram 208 the intensity of light illuminating a picture of a human face has been reduced to reduce the standard deviation of the distribution of pixel counts 210 to be similar to the standard deviation of the distribution of pixel counts 202 occurring in the histogram 200 corresponding to the live human face. Reducing the illumination in this fashion causes the distribution of pixel counts 210 corresponding to the dimmed picture of the human face to have a mean that is about half of the mean of the distribution of pixel counts 202 corresponding to the live human face.

Gaussian parameters m, σ, a, μ₃ and μ₄ for a live human face can be determined by first acquiring a sample image of the live human face by illuminating the live human face with NIR light and acquiring an RGB-NIR image. A histogram can be formed from the NIR pixels of the RGB-NIR image and values of Gaussian parameters can be determined based on the acquired histogram. Forming a histogram from pixels in an image is referred to as enrollment, and determining the values of Gaussian parameters is referred to as training a model. At a later time, when a human seeks access to the vehicle or device, a challenge image is acquired by illuminating the human with an NIR light and an NIR histogram is formed. Values of Gaussian parameters are determined by a computing device and compared to the trained model. If the values of the Gaussian parameters obtained from the challenge image are within a tolerance value of the values in the trained model, the challenge image is accepted, and the acquired image is transmitted to a computing device for further processing. Further processing can include facial recognition, for example. Tolerance values can be determined by empirical studies of histograms acquired from a plurality of live human faces and pictures of human faces. For example, values of m, σ, and a can be required to be within 50% of the values of m, σ, and a in the trained model for acceptance.
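The acceptance test against the trained model can be sketched as a per-parameter relative tolerance check. The dictionary layout, the function name and all numeric values below are illustrative assumptions; only the 50% tolerance comes from the example above.

```python
def within_tolerance(challenge, enrolled, tolerance=0.5):
    """Accept only if every challenge parameter lies within
    `tolerance` (as a fraction) of the enrolled value."""
    return all(
        abs(challenge[name] - enrolled[name]) <= tolerance * abs(enrolled[name])
        for name in enrolled
    )

# Hypothetical parameter values, not measurements from the disclosure.
trained_model = {"m": 62.0, "sigma": 12.0, "a": 0.08}
live_challenge = {"m": 70.0, "sigma": 14.0, "a": 0.07}
picture_challenge = {"m": 65.0, "sigma": 30.0, "a": 0.07}

accepted_live = within_tolerance(live_challenge, trained_model)
accepted_picture = within_tolerance(picture_challenge, trained_model)
```

In this made-up example the picture is rejected because its standard deviation is more than twice the enrolled value, which mirrors the distinction drawn between the distributions of pixel counts 202 and 206.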

Another technique for authentication of human faces is texture processing on the acquired NIR image. Texture is a measure of the variation in pixel values of small regions of an image. Texture processing can distinguish between portions of an acquired NIR image of a human face and acquired NIR images of a photograph or mask. The variation in pixel values caused by variation in the 3D structure of small regions of a human face yields far different texture measures than the smoother variation of corresponding regions of a photograph or a photographically produced mask. Examples of texture processing techniques include Gabor filters and local binary patterns. Gabor filters are 2D convolution kernels formed by multiplying 2D Gaussian functions with sinusoidal functions. Local binary patterns compare the pixel values of eight nearest neighbors with the pixel value of the central pixel and populate a binary word with 1s or 0s depending upon whether the neighboring pixel is greater than the central pixel. Both of these texture processing techniques can yield an output image that can be further processed to distinguish between a human face and a photographic simulation. The output of a texture processing process can also be processed using Gaussian parameters as discussed above.
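The local binary pattern described above can be sketched with NumPy as follows. The clockwise neighbor ordering starting at the top-left pixel and the omission of border pixels are assumptions of this sketch; the disclosure does not fix either choice.

```python
import numpy as np

def local_binary_pattern(image):
    """Compute an 8-bit local binary pattern for each interior pixel.

    A bit is set to 1 when the corresponding neighbor is greater than
    the central pixel. Border pixels are skipped (an assumption).
    """
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    # Clockwise from top-left; the bit order is an arbitrary choice.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor > center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

patch = np.array([[9, 1, 1],
                  [1, 5, 1],
                  [1, 1, 9]])
lbp = local_binary_pattern(patch)
```

The resulting code image can itself be histogrammed and summarized with Gaussian parameters in the manner discussed above.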

A spectroscopic material identification system as described herein can acquire NIR image data and train models for materials including cotton, polyester blends, latex, nylon and papers in addition to live human skin. Acquiring trained models for these types of materials can assist a live human recognition system in separating live human skin from materials that can be used to prepare masks that can be used to spoof facial recognition systems. For example, a photograph of a person can be printed on fabric which can be worn as a mask that conforms to a person's facial features. The combination of a high-resolution printed image of a human face with 3D facial contours can spoof a facial recognition system that relies on a 3D sensor to detect differences between a flat photograph and a human face. Techniques described herein improve the ability to distinguish between live human skin and a photographic likeness by acquiring data regarding the spectral response of human skin versus other materials. Likewise, techniques described herein improve the ability to distinguish between live human skin and silicone-based masks that can spoof systems that rely on 3D sensors to distinguish between 2D representations and live human faces.

Techniques described herein can also distinguish between live human faces and photographic likenesses despite objects such as facial piercings, eyeglasses, or temporary tattoos with metallic based ink. Objects like facial piercings, eyeglasses, or some tattoos can have different spectral reflectance compared to a face or materials of interest like leather or nylon. For example, eyeglasses can reflect infrared light differently depending on the presence of polarization layers in the glass. The presence of anomalies such as piercings and eyeglasses can be detected by using techniques described herein. Information regarding the anomalies' size and shape can be extracted by processing RGB and NIR images of the subject using machine vision techniques. A library of machine vision techniques for object recognition is included in Dlib, a toolkit containing machine learning algorithms and tools for creating complex software in C++. Dlib is available at GitHub.com under an open source license which permits its use free of charge. The location, size and shape of the anomalies can be subtracted from the image data prior to determining the histogram. Information regarding the anomalies can be added to a trained model database during enrollment and be used as additional data for identification and spoof rejection. Determining anomalies in NIR images is discussed in relation to FIGS. 7-11, below.
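Subtracting anomalies from the image data before determining the histogram can be sketched as a masked pixel count. The sketch assumes the anomaly detector has already produced a boolean mask of the same shape as the NIR image; how that mask is obtained (for example, with a machine vision library) is outside the sketch, and the names are hypothetical.

```python
import numpy as np

def masked_histogram(nir_image, anomaly_mask, levels=256):
    """Histogram of NIR intensities excluding pixels flagged as anomalies.

    `anomaly_mask` is True where a piercing, eyeglasses, etc. were
    detected; those pixels are left out of the count.
    """
    pixels = np.asarray(nir_image)
    keep = ~np.asarray(anomaly_mask, dtype=bool)
    return np.bincount(pixels[keep], minlength=levels)

image = np.array([[10, 10], [200, 10]], dtype=np.uint8)
mask = np.array([[False, False], [True, False]])   # bright pixel flagged
hist = masked_histogram(image, mask)
```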

Performing robust materials spectroscopy as discussed herein can require creating a database of sufficient size to hold all or at least a meaningful set of expected materials and fakes. A large database of materials may result in large search times, which can be undesirable for a system designed to run in real time such as a facial recognition system. Run time optimization may be performed by placing bounds on the materials search space. Run time optimization can be performed based on the calculated material reflectance, where, for example, the calculated material reflectance would only be compared to the nearest material neighbors. Run time optimization can also be performed based on context. Context can include expected type of materials and their associated frequency based on historical use, location and type of activity; the materials would then be considered in order of likelihood.
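The two run time optimizations described above, bounding the search to the nearest material neighbors and ordering candidates by contextual likelihood, can be combined in a short sketch. The single scalar reflectance per material, the library values and the context priors below are all made-up placeholders, not measured data.

```python
def candidate_materials(measured, library, prior, k=3):
    """Return the k materials nearest in reflectance to `measured`,
    ordered by contextual likelihood (most likely first)."""
    nearest = sorted(library, key=lambda name: abs(library[name] - measured))[:k]
    return sorted(nearest, key=lambda name: prior.get(name, 0.0), reverse=True)

# Hypothetical reflectance values and context-based likelihoods.
library = {"skin": 0.45, "latex": 0.50, "nylon": 0.60,
           "cotton": 0.70, "paper": 0.80}
prior = {"skin": 0.90, "latex": 0.05, "nylon": 0.02}

order = candidate_materials(0.48, library, prior)
```

Only the shortlisted materials would then be compared in full against the challenge data, in the returned order.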

FIG. 3 is a diagram of two NIR images 300, 308 and two NIR histograms 302, 310 generated from the images 300, 308, respectively. The first NIR image 300 is acquired with low ambient light and the second NIR image 308 is acquired with average ambient light. Ambient light is defined as illumination in a scene that comes from light sources other than the NIR light. For example, room lights, vehicle interior lights or sunlight can be sources of ambient light. Different sources of ambient light can include different amounts of NIR illumination. For example, natural sunlight and incandescent light include substantial amounts of NIR illumination while fluorescent and LED light include almost no NIR illumination. Changes in ambient light can cause changes in a histogram generated from an NIR image acquired with the ambient light. This is illustrated by histograms 302, 310.

Histograms 302, 310 include Gaussian distributions 304, 306, 312, 314 that were determined by fitting Gaussian distributions to the raw data and rendered in the histograms in place of the raw data. In histograms 302, 310 the frequency scale (FREQUENCY) on the Y-axes corresponds to relative values of the a parameter for each Gaussian distribution 304, 306, 312, 314 rather than raw count data (as is displayed in the histograms 200, 204, 208 discussed above). In histogram 302 Gaussian distribution 304 corresponds to NIR pixel data for the human face in the NIR image 300 and Gaussian distribution 306 corresponds to NIR pixel data for the background ambient light. In histogram 310, Gaussian distribution 312 corresponds to NIR pixel data for the human face in the NIR image 308 and Gaussian distribution 314 corresponds to NIR pixel data for the background ambient light. As can be seen from histograms 302, 310, changes in ambient light in NIR images 300, 308 have changed the values of Gaussian parameters m, σ, a, μ₃ and μ₄ from histogram 302 to different values in histogram 310. For example, the value of m has changed from about 62 in histogram 302 to about 100 in histogram 310, the value of σ has changed from about 12 in histogram 302 to about 25 in histogram 310 and the value of a has changed from about 0.08 in histogram 302 to about 0.06 in histogram 310.

Techniques discussed herein can compensate for ambient NIR illumination by fitting a Gaussian distribution to raw data corresponding to ambient NIR illumination. The shifts in Gaussian distributions are illustrated in histograms 302, 310 by Gaussian distributions 306, 314. The shifts in values of Gaussian parameters m, σ, a, μ₃ and μ₄ for Gaussian distributions for NIR pixel data for human faces based on values of Gaussian parameters m, σ, a, μ₃ and μ₄ for ambient NIR illumination can be determined empirically. A plurality of NIR images of human faces can be acquired in a plurality of ambient NIR conditions ranging from no NIR ambient illumination to high NIR ambient illumination. The relationships between the shifts in the face parameters and the values of the ambient parameters can be determined by performing a linear regression for each parameter separately. Linear regression can determine a linear relationship between the shift in values of Gaussian parameters m, σ, a, μ₃ and μ₄ for Gaussian distributions for NIR pixel data for human faces and measured values of Gaussian parameters m, σ, a, μ₃ and μ₄ for ambient NIR illumination. These linear relationships can be used to compensate for ambient NIR illumination.
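As an illustration of the per-parameter regression described above, the following is a minimal Python sketch, not the implementation used by the described system. The function names `fit_compensation` and `compensate` and the calibration values are hypothetical; the sketch assumes that one straight line is fitted for one parameter (here the mean m) and that compensation subtracts the predicted shift.

```python
import numpy as np

def fit_compensation(ambient_values, face_shifts):
    """Least-squares line: face_shift ~ slope * ambient_value + intercept."""
    slope, intercept = np.polyfit(ambient_values, face_shifts, deg=1)
    return slope, intercept

def compensate(face_value, ambient_value, slope, intercept):
    """Subtract the ambient-induced shift predicted by the fitted line."""
    return face_value - (slope * ambient_value + intercept)

# Hypothetical calibration data (illustrative only): ambient mean m
# versus the observed shift in the face mean m.
ambient_mean = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
face_mean_shift = np.array([0.0, 5.0, 10.0, 15.0, 20.0])

slope, intercept = fit_compensation(ambient_mean, face_mean_shift)
corrected = compensate(face_value=80.0, ambient_value=30.0,
                       slope=slope, intercept=intercept)
```

The same fit would be repeated independently for σ, a, μ₃ and μ₄, each with its own slope and intercept.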

In examples where determining a Gaussian distribution for ambient illumination is made difficult by greater variance in background illumination, ambient NIR illumination can be estimated by toggling the NIR illumination used to acquire the NIR image on to acquire a first NIR image, and then off to acquire a second NIR image. The second NIR image will thus include only ambient NIR illumination and can therefore be more readily analyzed using the methods discussed above to determine the effect of ambient NIR illumination. Values of Gaussian parameters m, σ, a, μ₃ and μ₄ for Gaussian distributions for ambient NIR illumination can be determined using this method and applied to the values of Gaussian parameters m, σ, a, μ₃ and μ₄ for Gaussian distributions for NIR pixels corresponding to the human face using the linear relationships determined above. This technique would require control of and synchronization with the NIR illuminator and would require acquisition of two frames of NIR image data, thereby increasing system cost and processing time. In authentication systems that use RGB data in addition to NIR data, the RGB image data can be used to determine ambient illumination in similar fashion to the techniques discussed for NIR images. Toggling a single RGB/NIR image will provide data that can be used to determine Gaussian parameters for red, green, and blue channels in addition to NIR channels.

FIG. 4 is a diagram of an NIR image 400 that illustrates a second technique for estimating ambient NIR illumination. In NIR image 400 the portion of the NIR image 400 occupied by the human face is indicated by an ellipse 402. This ellipse 402 can be generated by processing the NIR image 400 with facial recognition software available in Dlib, for example, as discussed above in relation to FIG. 2. Facial recognition software can determine an ellipse 402 that encloses the portion of NIR image 400 occupied by a human face. A histogram can be formed using pixels within the ellipse 402 to determine values of Gaussian parameters m, σ, a, μ₃ and μ₄ for Gaussian distributions for NIR pixel data for the human face within the ellipse 402, and a histogram for pixels outside the ellipse 402 can be used to determine values of Gaussian parameters m, σ, a, μ₃ and μ₄ for Gaussian distributions for NIR pixel data corresponding to ambient NIR illumination. Once the values of Gaussian parameters m, σ, a, μ₃ and μ₄ for both Gaussian distributions are determined, the values of Gaussian parameters m, σ, a, μ₃ and μ₄ for Gaussian distributions corresponding to the human face can be adjusted using the linear relationships determined above. In examples where the image data includes more than one human face, each human face can be detected, and an ellipse can be generated for each face. The average light intensity can be determined for each face and Gaussian parameters for pixels corresponding to each face can be determined. Gaussian parameters for background pixels can be determined more accurately by comparing the background to the Gaussian parameters for each face in the field of view of the camera.

Calculation of background pixel intensities can be initiated in an authentication system based on determining an overall average pixel intensity for an acquired NIR or RGB/NIR image and comparing it to a previously acquired value. If the average pixel intensity of the acquired image differs by more than a user-determined threshold value from that of a previously acquired image, the system can re-calculate the background value based on the currently acquired image. In other examples, the variance of the pixel values can be determined for a currently acquired image and compared to a variance determined based on a previously acquired image. If the variance of pixel values in the current image differs by more than a user-determined threshold amount from a previously determined variance value, new background pixel parameters can be determined as described above. In yet other examples, an elapsed time clock can be started when background pixel values are determined and the background pixel values can be re-determined when a user-determined time period, for example 10 seconds, has elapsed since the last background was determined.
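The three triggers described above can be sketched as a single check. This is a hedged illustration only: the text presents the mean, variance and elapsed-time tests as alternative examples, and combining them with a logical OR is an assumption. The function name `background_needs_update` and the default threshold values are hypothetical.

```python
import time
import numpy as np

def background_needs_update(image, prev_mean, prev_var, last_update,
                            mean_tol=10.0, var_tol=100.0,
                            max_age_s=10.0, now=None):
    """Return True when any re-calculation trigger fires.

    mean_tol, var_tol and max_age_s stand in for the user-determined
    thresholds; their default values here are illustrative only.
    """
    now = time.monotonic() if now is None else now
    mean_changed = abs(float(image.mean()) - prev_mean) > mean_tol
    var_changed = abs(float(image.var()) - prev_var) > var_tol
    expired = (now - last_update) >= max_age_s
    return mean_changed or var_changed or expired
```

A caller would store the mean, variance and timestamp each time the background parameters are re-determined and pass them back in on the next frame.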

Another technique for determining Gaussian parameters for pixels outside of the ellipse 402 is to divide portions of the NIR image 400 outside of the ellipse 402 into segments with uniform size and shape. Background portions of NIR image 400 are the portions of NIR image 400 outside of ellipse 402. Background portions of NIR image 400 can include objects that reflect NIR illumination and interfere with calculation of Gaussian parameters that correspond to background illumination. For example, a hat, a scarf, or a subject's hand can be included in the background portions of NIR image 400. The background portion of NIR image 400 can be divided into segments with uniform size and shape by combining a user-determined pattern of regions with the ellipse 402 corresponding to a subject's face. The background segments can be contiguous or non-contiguous. Gaussian parameters of pixel values can be determined for each segment. An overall mean and standard deviation for pixel values can be determined, i.e., for all segments, and segments with a standard deviation that is less than or equal to the overall standard deviation can be retained for generation of a histogram and determination of Gaussian parameters m, σ, a, μ₃ and μ₄ to correspond to the background illumination. Regions with standard deviations greater than the overall standard deviation are eliminated from the background histogram generation. The overall standard deviation can include a user-determined tolerance value based on the amount of noise in the pixel values. Determining the background histogram in this fashion reduces the effect of objects in the background portion of NIR image 400 on the Gaussian parameter determination for background portions of NIR image 400.
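The segment filtering described above might look like the following Python sketch. It assumes square, non-overlapping blocks as the user-determined pattern and a boolean `face_mask` that is True inside the ellipse; both are illustrative choices, and the function name `background_pixels` is hypothetical.

```python
import numpy as np

def background_pixels(image, face_mask, block=8, tolerance=0.0):
    """Collect background pixels from low-variation uniform segments.

    Blocks that overlap the face mask are skipped. A block is retained
    when its standard deviation is at most the overall background
    standard deviation plus a tolerance.
    """
    height, width = image.shape
    segments = []
    for r in range(0, height - block + 1, block):
        for c in range(0, width - block + 1, block):
            if face_mask[r:r + block, c:c + block].any():
                continue  # segment touches the face ellipse
            segments.append(image[r:r + block, c:c + block].ravel())
    if not segments:
        return np.array([])
    overall_std = np.concatenate(segments).std()
    kept = [s for s in segments if s.std() <= overall_std + tolerance]
    return np.concatenate(kept) if kept else np.array([])
```

The returned pixels would then be histogrammed and fitted to obtain the background Gaussian parameters.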

Another technique for determining ambient NIR illumination is based on combining data from the NIR channel with data from one or more of the RGB channels. The Gaussian distributions corresponding to a human face in a histogram can be normalized based on data from the blue channel, for example. Normalization can be performed by dividing the value in each NIR pixel by corresponding pixel values in the blue channel. Dividing by the value of pixels in the blue channel normalizes the NIR data because the data in the blue channel corresponds to ambient illumination without NIR illumination. Dividing the NIR pixel values by corresponding pixel values from the blue channel can approximate the effect of toggling the NIR light off to acquire an ambient illumination image without requiring the time and expense of controlling the NIR light.
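A minimal sketch of the per-pixel division follows. The `eps` floor that guards against division by zero in dark blue pixels is an assumption added for the example and is not part of the description above; the function name is hypothetical.

```python
import numpy as np

def normalize_nir_by_blue(nir, blue, eps=1.0):
    """Divide each NIR pixel by the co-located blue-channel pixel.

    eps is an assumed lower bound on the divisor so that zero-valued
    blue pixels do not cause a division by zero.
    """
    nir = nir.astype(np.float64)
    blue = blue.astype(np.float64)
    return nir / np.maximum(blue, eps)

# Illustrative values only
normalized = normalize_nir_by_blue(
    np.array([[100, 50]], dtype=np.uint8),
    np.array([[50, 0]], dtype=np.uint8),
)
```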

Another technique for combining NIR and RGB data is channel fusion. Channel fusion is when data from two or more channels are combined to form a multidimensional space. For example, data from the NIR channel and data from the blue channel can be combined using orthogonal axes corresponding to the NIR data and the blue channel data that form a two-dimensional space. Each pixel will be located in the two-dimensional space according to its NIR channel and blue channel values and the resulting graph will indicate the pixel count corresponding to the NIR channel and blue channel values. Two-dimensional Gaussian curve fitting can be performed on the two-dimensional space that includes the two-dimensional count data to determine two-dimensional Gaussian parameters that can be processed in similar fashion to one-dimensional Gaussian parameters. Additional channel data can be combined by adding an additional orthogonal axis for each channel added, thereby forming higher dimensional spaces. Higher dimensional Gaussian parameters can be determined for the higher dimensional spaces and processed in similar fashion to one-dimensional Gaussian parameters as discussed below in relation to FIG. 5.
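The two-dimensional count data can be sketched with NumPy as below. Note the substitution: instead of a full two-dimensional curve fit, this example estimates the two-dimensional Gaussian parameters (mean vector and covariance matrix) directly from sample moments, which is a simpler stand-in. Function names, bin count and value range are hypothetical.

```python
import numpy as np

def fuse_channels(nir, blue, bins=32, value_range=(0, 256)):
    """2-D count data: axis 0 is NIR intensity, axis 1 is blue intensity."""
    counts, nir_edges, blue_edges = np.histogram2d(
        nir.ravel(), blue.ravel(),
        bins=bins, range=[value_range, value_range])
    return counts, nir_edges, blue_edges

def gaussian_2d_moments(nir, blue):
    """Moment-based estimate of a single 2-D Gaussian (mean, covariance)."""
    samples = np.stack([nir.ravel(), blue.ravel()]).astype(np.float64)
    return samples.mean(axis=1), np.cov(samples)
```

Adding a third channel would extend `samples` with another row, giving a three-element mean and a 3x3 covariance matrix.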

FIG. 5 is a diagram of an NIR image 500 of a human face illuminated by NIR light and a histogram 502 corresponding to the NIR pixels of the NIR image 500. The histogram 502 includes count data 504 plotted on a graph with the Y-axis corresponding to the number of pixels at each pixel value (represented by the axis labeled COUNT) and the X-axis (labeled INTENSITY) corresponding to the range of pixel values, i.e., intensities, in the NIR image 500. The dotted lines 506 enclose or bound pixel values that correspond to intensity values for human skin occurring in NIR image 500. The remainder of the count data 504 corresponds to non-skin portions of the human face in NIR image 500, for example facial hair including beard and eyebrows, lips, eyes and eyeglasses.

Each portion of the human face included in NIR image 500 corresponds to a Gaussian distribution of pixel values. In histogram 502 a plurality of Gaussian distributions corresponding to a plurality of portions of NIR image 500 are added together to form the count data 504. An issue with processing count data 504 to determine Gaussian distributions for a plurality of portions of an NIR image 500 is determining the separate Gaussian distributions for each of the portions. One technique for separating the Gaussian distributions is to assume a Gaussian mixture model for the count data 504. A Gaussian mixture is a probabilistic model for representing subpopulations within an overall population. In this example, count data 504 is modeled as a mixture of a plurality of components, where each component is a Gaussian distribution.

The Gaussian mixture corresponding to the count data 504 can be a probability distribution $p_n(x)$ for the $n$-th iteration which is equal to a function of K Gaussian distributions (components) determined by the equation:

$$p_n(x) = \sum_{i=1}^{K} \tilde{\phi}_i \, \mathcal{N}(\tilde{\mu}_i, \tilde{\Sigma}_i) \qquad (4)$$

Where $\tilde{\phi}_i$ is an estimate of the mixture weight, which is the prior probability corresponding to a component i, and $\mathcal{N}(\tilde{\mu}_i, \tilde{\Sigma}_i)$ is a Gaussian (normal) distribution function for each component described by an estimated mean $\tilde{\mu}_i$ and an estimated covariance matrix $\tilde{\Sigma}_i$ that describes the distribution of each component in the presence of each other component.

One technique for determining the distributions of each component of a Gaussian mixture is Expectation Maximization (EM). Given an initial estimate of K, the number of components, the EM algorithm iterates on equation (4), adjusting component weights $\tilde{\phi}_i$, calculating a new distribution $p_n(x)$ at each step and determining a conditional probability for the new distribution based on the values of Gaussian parameters m, σ, a, μ₃ and μ₄ determined for each of the Gaussian distributions in the population. Each iteration of the EM algorithm updates the values of m, σ, a, μ₃ and μ₄ and the mixture weights $\tilde{\phi}_i$ to increase the conditional probability that the calculated distribution $p_n(x)$ is equal to the input distribution p(x). Iterating using the EM technique will converge on a solution corresponding to a summed square difference less than a threshold in a finite number of steps. Problems with convergence to local maxima and sensitivity to the starting point can be addressed by determining a sample grid based on the probable solution space, and an appropriate threshold can be determined based on empirical testing. Using a Gaussian mixture model is one technique to isolate pixel data of interest to perform authentication using material spectroscopy as described herein. Other techniques include other types of filters applied to both the histogram data and input image data, higher order statistical processing applied to the histogram data, or deep neural network processing as will be described below in relation to FIG. 7.
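To make the EM iteration concrete, here is a minimal one-dimensional sketch in Python. It is an illustration under stated assumptions rather than the described system's procedure: it runs a fixed number of iterations instead of testing a summed-square-difference threshold, it initializes the means from data quantiles instead of a sample grid, and it estimates only the weight, mean and standard deviation of each component (not μ₃ and μ₄). The function name `em_gmm_1d` is hypothetical.

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture to samples x with EM."""
    x = np.asarray(x, dtype=np.float64)
    weights = np.full(k, 1.0 / k)
    # Assumed initialization: evenly spaced quantiles of the data.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        diff = x[:, None] - means[None, :]
        pdf = np.exp(-0.5 * diff ** 2 / variances)
        pdf /= np.sqrt(2.0 * np.pi * variances)
        resp = weights * pdf
        resp /= resp.sum(axis=1, keepdims=True) + 1e-300
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances += 1e-6  # keep variances strictly positive
    order = np.argsort(means)
    return weights[order], means[order], np.sqrt(variances[order])
```

Applied to the intensities of an NIR face image, the component whose mean falls inside the expected skin range would be taken as the skin distribution.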

FIG. 6 is a diagram of an NIR image 600 of a picture of a human face and a histogram 602 formed by count data 604 representing pixels of the NIR image 600. The histogram 602 includes count data 604 plotted on a graph with the Y-axis corresponding to the number of pixels at each pixel value (COUNT) and the X-axis corresponding to the range of pixel values (INTENSITY) in the NIR image 600. The dotted lines 606 enclose (i.e., bound) the pixel values that correspond to intensity values for “skin” occurring in NIR image 600. The remainder of the count data 604 corresponds to non-“skin” portions of the picture of the human face in NIR image 600, for example representations of facial hair, eyes, and eyeglasses. The distribution of count data 604 for the picture of the human face differs from the distribution of count data 504 from the live human face. Separating the distributions corresponding to the real human skin and the photographic representation of human skin from the count data 504, 604 using the EM algorithm based on a Gaussian mixture model can distinguish between an image of a real human and an image of a photograph of a human by extracting Gaussian distributions that can be analyzed by the techniques described above in relation to FIGS. 2-4.

FIG. 7 is a diagram of an NIR image 700 and a segmented NIR image 702 that illustrates another technique for separating Gaussian distributions belonging to separate components or portions of an NIR image 700. In this example, the NIR image 700 is segmented prior to forming histograms and separate histograms are generated corresponding to each portion or component of the NIR image 700. The image of the human face in NIR image 700 can be processed using a trained deep neural network to generate a segmented image 702 or by using the Dlib image processing library as discussed above in relation to FIG. 2 to determine facial landmarks which can be processed to form segmented image 702. For example, an input NIR image 700 can be processed using the Dlib image processing library to generate facial landmarks. Facial landmarks are locations on an image of a human face that can be repeatably determined on images of human faces, where repeatably determined means that the same landmarks will be determined on a plurality of different input images. For example, the Dlib image processing library can locate facial landmarks corresponding to the inside and outside corners of each eye visible in an image of a human face, along with facial landmarks corresponding to the upper and lower lids of each eye. Fitting an ellipse to the facial landmarks corresponding to each eye will provide an area that segments the eye portions of an image of a human face. Likewise, facial landmarks corresponding to other portions of a human face such as lips and facial hair can be joined by lines that generate regions that can be used to segment an NIR image 700 of a human face to generate a segmented image 702.

Another technique for processing an input NIR image 700 to determine image segments corresponding to human skin, eyes, facial hair, etc. is to train a deep neural network to process NIR images 700 that include human faces. A deep neural network can include convolutional layers and fully-connected layers that process input NIR images 700 and output an image with labeled regions corresponding to portions of a human face and background portions as illustrated in segmented image 702. A convolutional neural network can be trained by generating a training dataset by labeling a large number (>1000) of NIR images 700 of human faces to form ground truth images for training the convolutional neural network. An NIR image 700 can be manually labeled by a human operator using graphics software that permits the human operator to overlay labeled regions on an NIR image 700 of a human face. Labeled NIR images 702 can also be generated by inputting the NIR images 700 into the Dlib software as discussed above to generate facial landmarks which can be processed either manually or using machine vision software to generate regions corresponding to facial features to be segmented as discussed above. In either case, at training time, NIR images 700 are input to the convolutional neural network, a loss function is determined based on comparing the output from the convolutional neural network to the ground truth segmented NIR images 702, and the resulting loss value is used to select weights for the convolutional neural network to minimize the loss function. In operation, an NIR image 700 that includes a human face is input to the trained convolutional neural network and a segmented NIR image 702 that includes labels is output.

A segmented image 702 generated by machine vision software or by a deep neural network assigns color or greyscale values based on a small number (<10) of possible different types of facial features. The facial features include skin 704, background (non-facial) 706, eyebrows 708, 710, eyes 712, 716, facial hair (beard) 714 and lips 718. Other portions of the input NIR image 700, such as eyeglasses, can be ignored by the segmentation process as being non-essential and not likely to impact calculation of the histograms. Image segmentation is a “denoising” technique that provides histogram data corresponding to skin portions of a human face without including non-skin portions such as eyes or facial hair. Additional processing can be required to isolate skin portions from non-skin portions. Filters, higher order statistics or further processing with neural networks can further isolate pixels corresponding to human skin. For example, machine vision software or deep neural networks can also generate segmented NIR images that isolate the upper cheek and nose regions similar to mask portions 1008, 1108, 1208, 1308 of NIR images 1000, 1100, 1200, 1300, respectively. Using segmentation techniques to mask input NIR images in this fashion can reliably provide pixel data corresponding to skin portions of a human face because the upper cheek and nose regions of the human face are rarely obscured by facial hair or otherwise covered by clothing.

In addition to processing images of human faces, image segmentation can be used to segment images of consumer goods or industrial parts to locate regions that can be processed to distinguish genuine items from fake items. For example, an article can be processed to isolate a portion of the article corresponding to leather. Techniques described herein can be used to differentiate between genuine leather and imitation leather based on spectroscopic response as discussed above in relation to distinguishing skin from a photograph of skin. Segmentation techniques can also be evaluated based on the success or failure of the overall authentication process as discussed above in relation to FIG. 5. Segmentation images corresponding to true negatives (correctly identifying an attempt to spoof the system) and false negatives (denying access to a valid user) can be stored and used to retrain the system both locally on the computing device 115 in a vehicle 110 and uploaded to a cloud-based server computer to be used to retrain the authentication system to be shared with other vehicles in a federated system. As discussed above, in a federated system, each vehicle shares results that can be used to retrain the entire system, which can then be shared with all of the vehicles.

In addition to segmenting NIR images, a deep neural network can be trained to process NIR images directly to determine whether the NIR image includes a real human face or a fake human face. A deep neural network can be used to process NIR images directly, without extracting one-dimensional or multi-dimensional Gaussian parameters from histogram data. A deep neural network includes a plurality of convolutional and fully-connected layers that process input data using weights to determine the processing performed at each layer. The weights for each layer can be determined by training the deep neural network using a large number, which can be greater than 1000, of training images. Each training NIR image includes ground truth corresponding to the NIR image, where ground truth is the “correct answer” corresponding to the image data determined by a means independent from the deep neural network. In this example, the correct answer can be determined by a human observing the image and labeling the image data as “human” or “fake”, for example. The deep neural network processes each input image a plurality of times to attempt to classify the input image as “human” or “fake”. Output from the deep neural network is compared to the ground truth to determine a loss function, which is backpropagated to the deep neural network to determine which weights achieve a correct answer, i.e., a low loss value. The weights which achieve low loss for the most input images are retained and form the parameters used in the trained deep neural network. The trained deep neural network can then be used to process challenge images to determine whether the input image is “human” or “fake”.

FIG. 8 is a diagram of a segmented NIR image 800 of a photograph of a human face and a histogram 802 determined based only on the “skin” portion of the segmented NIR image 800. Segmented NIR image 800 is based on NIR image 600 from FIG. 6 and is determined using a deep neural network or machine vision software as discussed above in relation to FIG. 7. Segments included in segmented image 800 can be used as a mask to guide generation of histogram 802 from the original NIR image 600 used to form the segmented NIR image 800. In histogram 802, only the pixels of the NIR image 600 that are included in the “skin” portion of segmented NIR image 800 are used to calculate the count data 804. As can be seen from histogram 802, the count data 804 is concentrated in the “skin” 806 portion of histogram 802 enclosed by the dotted lines. Comparing histogram 802 to histogram 602, which was determined based on the same NIR image 600 without segmentation, shows that histogram 802 includes far less data from non-skin portions of the NIR image 600. Use of a segmented NIR image 800 as a mask permits calculation of Gaussian parameters without requiring application of a Gaussian mixture model, or improves the performance of a Gaussian mixture model applied to the count data 804 to extract the Gaussian distribution that includes only the skin portion of the NIR image 600.
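Using a segmented image as a mask for histogram generation can be sketched as follows. The label value `SKIN_LABEL`, the function name and the 256-bin intensity range are assumptions made for the example; the segmentation itself is taken as given.

```python
import numpy as np

SKIN_LABEL = 1  # hypothetical label value assigned to skin segments

def skin_histogram(nir, labels, skin_label=SKIN_LABEL, bins=256):
    """Histogram and simple statistics of NIR pixels labeled as skin."""
    skin_pixels = nir[labels == skin_label]
    counts, _ = np.histogram(skin_pixels, bins=bins, range=(0, 256))
    return counts, skin_pixels.mean(), skin_pixels.std()

# Illustrative 2x2 example: three skin pixels and one non-skin pixel
nir_example = np.array([[10, 200], [12, 14]])
labels_example = np.array([[1, 0], [1, 1]])
counts, mean, std = skin_histogram(nir_example, labels_example)
```

Because only skin-labeled pixels contribute, the mean and standard deviation can be read off directly without first separating mixture components.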

FIG. 9 is a diagram of a segmented NIR image 900 and a histogram 902 determined based on the “skin” portion of the segmented NIR image 900. Segmented NIR image 900 is based on the NIR image 700 of a human face and is determined using a deep neural network or machine vision software as discussed above in relation to FIG. 7. Segments included in segmented image 900 can be used as a mask to guide generation of histogram 902 from the original NIR image 700 used to form the segmented NIR image 900, similarly to FIG. 8. In histogram 902, only the pixels of the NIR image 700 that are included in the “skin” portion of segmented NIR image 900 are used to calculate the count data 904. As can be seen from histogram 902, the count data 904 is concentrated in the “skin” 906 portion of histogram 902 denoted by the dotted lines. Comparing histogram 902 to histogram 802, it can be seen that calculation of Gaussian parameters for the two distributions can be used to distinguish the two distributions and thereby distinguish an NIR image 600 of a photograph from an NIR image 700 of a human face.

FIG. 10 is a diagram of an NIR image 1000 of a photograph of a masked human face and a histogram 1002 determined based on the unmasked portion of the masked NIR image 1000. An unmasked portion 1008 of the masked NIR image 1000 is determined based on a previous NIR image of a human face acquired with the same camera by manually determining a mask that includes skin from the human face and masks off other types of data including facial hair, eyes, etc. Masking is a technique for determining a portion of an NIR image 1000 to be used to form a histogram 1002 that reduces the amount of non-skin data included in the count data 1004. Masking using a predetermined unmasked portion 1008 requires cooperation from the subject. For example, a human seeking approval from a liveness determination system as discussed herein would have to ensure that their face was positioned correctly with respect to the camera acquiring the NIR image. Positioned correctly means that the skin portion of the human's face must appear in the unmasked portion 1008 of the NIR image 1000.

The advantage of the masking technique for acquiring a histogram 1002 based on a masked NIR image 1000 of a picture of a human face is that most of the count data 1004 corresponding to the unmasked portion 1008 of the NIR image 1000 is concentrated in a portion 1006 of the histogram 1002 between the dotted lines. Concentrating the count data 1004 in this manner and eliminating extraneous data using a mask advantageously can reduce the amount of computation required to isolate count data 1004 corresponding to skin. This concentration of count data 1004 permits computation of the Gaussian parameters m, σ and a without having to first calculate Gaussian mixture parameters or segment NIR facial images to separate skin histogram count data from other types of count data.

FIG. 11 is a diagram of a masked NIR image 1100 of a human face and a histogram 1102 corresponding to the unmasked portion 1108 of NIR image 1100. The unmasked portion 1108 of the masked NIR image 1100 is determined manually as discussed above in relation to FIG. 10. As can be seen in histogram 1102, masking concentrates count data 1104 corresponding to human skin into a portion 1106 of the histogram 1102 between the dotted lines. As in histogram 1002 of FIG. 10, concentrating count data 1104 advantageously permits determining the Gaussian parameters m, σ and a without requiring calculation of Gaussian mixture parameters or segmenting NIR facial images to separate skin histogram count data from other types of count data. As can be seen from histograms 1002 and 1102, masked NIR images 1000, 1100 readily separate Gaussian distributions of count data 1004, 1104 corresponding to photographs of skin and human skin, thereby permitting a liveness determination system to differentiate between a photograph of human skin and live human skin. This permits a liveness determination system to forward an NIR image corresponding to the masked NIR image 1100 that includes a live human face to a facial recognition system and reject the NIR image of a photograph of a human face corresponding to the masked NIR image 1000.

In some examples, additional features included in a human face, such as tattoos and piercings, may complicate calculation of Gaussian parameters for skin portions of a human face. In these examples, additional masks can be determined which cover additional portions of a human face to mask off portions that include non-skin or modified skin portions such as tattoos, piercings, etc. that can interfere with calculations of Gaussian parameters. Another technique for eliminating non-skin portions of an NIR image of a human face recognizes that piercings, for example, are highly reflective of NIR light and therefore appear bright in an NIR image. A filter applied to the NIR image that filters out bright regions of the NIR image can be used to eliminate non-skin regions of a human face corresponding to jewelry including piercings.

Another technique for masking an NIR image of a human face is to generate uniform random patches or regions on portions of the NIR image that include a human face. A set of similarly-sized random patches can be compared to a segmented image 702 of the human face such as shown in FIG. 7 to generate a score for each patch that corresponds to the percentage of human skin included in the patch. Patches can be scored as low, meaning no skin pixels are included in the patch, medium, meaning some skin pixels are included in the patch, or high, meaning that most of the pixels in the patch are skin pixels. Medium patches can be further subdivided into sub-patches and compared to the segmented image 702 to determine high subdivided patches that include mostly skin pixels and low subdivided patches that do not include mostly skin. The patches rated high and the subdivided patches rated high can be combined and a histogram can be generated based on the high patches and high subdivided patches. Using random patches in this fashion can speed the computation of Gaussian parameters corresponding to skin portions of an NIR image and thereby improve the determination of liveness for an NIR image of a human face.
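The patch scoring and subdivision described above could be sketched as follows. The 0.9 cutoff used for "most of the pixels", the split of a medium patch into four quadrants, and the function names are assumptions for illustration; `skin_mask` is taken to be a boolean array derived from a segmented image such as 702.

```python
import numpy as np

def score_patch(skin_mask, top, left, size, high=0.9):
    """Score a square patch by its fraction of skin pixels."""
    fraction = skin_mask[top:top + size, left:left + size].mean()
    if fraction == 0.0:
        return "low"
    return "high" if fraction >= high else "medium"

def select_skin_patches(skin_mask, patches, size, high=0.9):
    """Keep high patches; split medium patches and keep high quadrants."""
    selected = []
    for top, left in patches:
        label = score_patch(skin_mask, top, left, size, high)
        if label == "high":
            selected.append((top, left, size))
        elif label == "medium" and size >= 2:
            half = size // 2
            for dt in (0, half):
                for dl in (0, half):
                    sub = score_patch(skin_mask, top + dt, left + dl,
                                      half, high)
                    if sub == "high":
                        selected.append((top + dt, left + dl, half))
    return selected
```

The pixels covered by the returned `(top, left, size)` patches would then be pooled to form the skin histogram.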

FIG. 12 is a diagram of a masked NIR image 1200 of a human face and a histogram 1202 corresponding to the unmasked portion 1208 of masked NIR image 1200. NIR image 1200 and histogram 1202 correspond to an NIR image acquired with the human face at about a 1 meter (m) distance. FIG. 12 illustrates a solution to the problem caused by differences in distributions of NIR pixel intensities in histogram 1202 caused by differences in distances from the camera of objects, in this example human faces. Techniques discussed herein employ light from a near-point source wide field NIR illuminator. Most NIR illuminators that are not laser-based or collimated with special optics are near-point source wide field NIR illuminators. For example, light emitting diode (LED) based NIR illuminators are typically configured to provide near-point source wide field NIR illumination. Light from a near-point source wide field NIR illuminator will spread out as it is transmitted from the source to the object to be illuminated. Because it is spreading out in two dimensions perpendicular to the direction of propagation, the intensity of the light will be subject to an inverse square law reduction in intensity per unit area. The inverse square law in this context means that the per unit area intensity of the NIR light will be subject to a reduction in intensity proportional to the inverse of the squared distance from the source.

When NIR illumination, subject to inverse square law reduction in intensity, illuminates an object, the light can be reflected by specular reflection or diffuse reflection. Specular reflection is reflection from a mirror or polished surface such as a metal, where the direction and polarization of each light ray reflected by the surface is preserved so that images, for example, are preserved. Diffuse reflection is reflection from a surface wherein each light ray is absorbed by the surface and re-emitted in a random direction at a random polarization. In diffuse reflection each point on an illuminated surface in effect becomes a point source, wide field emitter of light. One difference between specular reflection and diffuse reflection is that in specular reflection, the reflected light continues to be governed by inverse square law reduction in intensity, while diffuse reflection subjects the reflected light to a second inverse square law reduction, making the net intensity of the light subject to an inverse quadratic law reduction in intensity as a function of the distance from the source, that is, the intensity of the light is reduced by the inverse fourth power of the distance from the emitter. As a result, the intensity of pixel data in an NIR image acquired with diffusely reflected NIR light will be subject to an inverse quadratic reduction in intensity, and distributions of pixel intensities in histograms formed based on the pixel data will reflect this reduction in intensity based on distances to NIR illuminated objects.
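The distance dependence described above can be written out as a short worked relation. This follows the model stated in the text and assumes the illuminator and camera are co-located at distance d from the surface; the symbols P (emitted power) and ρ (surface reflectance) are hypothetical names introduced only for this illustration.

```latex
% Illumination reaching the surface (first inverse square law):
E_{\text{surface}}(d) \propto \frac{P}{d^{2}}

% Diffusely reflected light returning to the camera
% (second inverse square law applied to the re-emitted light):
I_{\text{diffuse}}(d) \propto \rho \, \frac{P}{d^{2}} \cdot \frac{1}{d^{2}}
  = \frac{\rho P}{d^{4}}

% Ratio of intensities for the same surface at two distances:
\frac{I_{\text{diffuse}}(d_{2})}{I_{\text{diffuse}}(d_{1})}
  = \left( \frac{d_{1}}{d_{2}} \right)^{4}
```

For example, under this model, moving a diffusely reflecting surface from 0.5 m to 1 m reduces the returned intensity by a factor of (0.5/1)^4 = 1/16, whereas a single inverse square law reduction would give a factor of 1/4.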

In practice, objects, including human faces, will reflect NIR light in a combination of specular and diffuse reflections. For example, highlights or bright spots in NIR images caused by eyeglasses and metallic jewelry such as piercings are examples of specular reflections. Patches of shiny skin surfaces can also include a higher percentage of specular reflections, hence their apparent brightness in comparison to surrounding skin. Although human skin, for example, can reflect light subject to a combination of inverse square law reduction and inverse quadratic law reduction, in general the intensity of reflected light will decrease with increasing round-trip distance between the NIR illuminator and the NIR camera. Techniques discussed herein can improve liveness determination by compensating for distance variation between objects in a way that preserves reflectance data, which permits liveness determination based on the reflectance data. Traditional techniques, such as histogram equalization, increase image contrast in a fashion that alters the histogram data and thereby prevents liveness determination.

In addition to the inverse square law reduction in intensity, the design of the lens included in the camera acquiring the RGB/NIR data can be taken into account when performing relative distance estimation for objects in the field of view of a camera. For example, depending upon the f-number of the lens, distortion will be introduced into an image of an object based on the distance of the object from the lens. For example, a wide-angle lens (low f-number lens) will introduce distortion making a person's nose look comically large as the person's face approaches the camera lens. A wide-angle lens can expand objects near the lens and compress objects located far away depending upon the location of the object with respect to the optical axis. Other lenses, for example telecentric or rectilinear lenses, do not distort objects based on distance. Techniques disclosed herein can compensate for lens distortion by permitting parameters corresponding to the lens size, magnification and f-number to be input to the authentication system for each camera to be used. The lens parameters can be used to determine a homography matrix which can be used to perform an affine transformation on an image and compensate for distortion introduced by the lens. An affine transformation can compensate for distortion by changing pixel locations in an image by performing translations, rotations and scale changes in x and y for pixels in an image plane.
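As a minimal sketch of the translations, rotations and scale changes described above, the following Python code builds a 3x3 matrix in homogeneous coordinates and applies it to a pixel location. The function names and the numeric values are hypothetical, and the sketch does not show how such a matrix would be derived from lens size, magnification or f-number, which this description does not specify.

```python
import math


def make_affine(scale_x, scale_y, angle_rad, shift_x, shift_y):
    """Build a 3x3 matrix that scales, then rotates, then translates."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [
        [scale_x * c, -scale_y * s, shift_x],
        [scale_x * s, scale_y * c, shift_y],
        [0.0, 0.0, 1.0],
    ]


def apply_affine(matrix, x, y):
    """Map pixel location (x, y) to its transformed location."""
    new_x = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    new_y = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return new_x, new_y


# Hypothetical correction: double the scale, no rotation, shift by (10, -5).
correction = make_affine(2.0, 2.0, 0.0, 10.0, -5.0)
corrected_location = apply_affine(correction, 3.0, 4.0)
```

In an image pipeline the same matrix would be applied to every pixel location, typically with interpolation to resample the output image.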

Techniques described herein perform a relative distance estimation based on measurements of pixel intensities from NIR image regions that include skin portions of a human face. These techniques are successful as long as the subject in the NIR image cooperates in making sure that skin portions of the subject's face are within unmasked regions of the NIR image. Histogram 1202 illustrates Gaussian distributions 1204, 1206 fit to raw count data from NIR image 1200 mask portion 1208. NIR image 1200 was acquired with the subject approximately 1 m (meter) from the camera. Gaussian distribution 1204 corresponds to background portions of NIR image 1200 and Gaussian distribution 1206 corresponds to NIR light reflected from the subject's facial skin.

FIG. 13 is a diagram of a masked NIR image 1300 of a human face and a histogram 1302 corresponding to the unmasked portion 1308 of masked NIR image 1300. Masked NIR image 1300 and histogram 1302 correspond to an NIR image acquired with the human face at about a 0.5 meter (m) distance. Histogram 1302 illustrates Gaussian distributions 1304, 1306 fit to raw count data from NIR image 1300 mask portion 1308. Gaussian distribution 1304 corresponds to background portions of NIR image 1300 and Gaussian distribution 1306 corresponds to NIR light reflected from the subject's facial skin.

To estimate the relative distance between subjects in NIR images 1200, 1300, parameters for a Gaussian distribution of pixels on target are calculated at enrollment. Pixels on target are defined as pixels that fall within the unmasked portion 1208, 1308 of the NIR image 1200, 1300. When a subject's NIR image is presented for authentication or challenge, an inverse quadratic relationship between the Gaussian distribution means can be approximated by the equation:

$RD = \left( \frac{\text{enrolled pixels}}{\text{challenge pixels}} \right)^{\frac{1}{2}}, \qquad (5)$

where the enrolled pixels can correspond to the actual histogram pixel intensity values or the Gaussian distribution mean and the challenge pixels can likewise correspond to the actual histogram pixel intensity values or the Gaussian distribution mean. Because the distance from the sensor of the subject at enrollment might not be known, RD is a relative measure that measures the distance from the sensor of the subject in the challenge NIR image relative to the distance from the sensor of the subject in the enrollment NIR image. The raw count data or the Gaussian distributions based on the raw histogram data in histograms 1202, 1302 can be scaled based on the calculated relative distance according to the equation:

$LS = \beta \times \left( \left( \frac{\text{enrolled pixels}}{\text{challenge pixels}} \right)^{\frac{1}{2}} \right)^{2}, \qquad (6)$

where LS is the liveness scale factor used to multiply the raw count data in histograms and β is a scale factor that can be determined empirically by experimentation with an example NIR illuminator, an example NIR camera and a plurality of NIR images of subjects at a plurality of distances. If the relative distance RD is not required for other calculations, equation (6) simplifies to:

$LS = \beta \times \left( \frac{\text{enrolled pixels}}{\text{challenge pixels}} \right). \qquad (7)$
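Equations (5) through (7) can be sketched in Python as follows. The function names, the default value of β, and the numeric means are hypothetical and chosen only for illustration; applying the scale factor directly to intensity values is likewise one assumed reading of how the histogram data is scaled.

```python
def relative_distance(enrolled_mean, challenge_mean):
    """Equation (5): square root of the enrolled to challenge ratio."""
    return (enrolled_mean / challenge_mean) ** 0.5


def liveness_scale(enrolled_mean, challenge_mean, beta=1.0):
    """Equation (7): beta times the enrolled to challenge ratio.

    This equals equation (6), beta * RD**2, without computing RD first.
    """
    return beta * enrolled_mean / challenge_mean


def scale_intensities(challenge_intensities, scale):
    """Multiply challenge intensity values by the liveness scale factor."""
    return [value * scale for value in challenge_intensities]


# Hypothetical Gaussian means: the challenge image is brighter than the
# enrollment image, so the challenge subject is estimated to be closer.
rd = relative_distance(40.0, 160.0)
ls = liveness_scale(40.0, 160.0)
scaled = scale_intensities([160.0, 80.0], ls)
```

With these example values the scaled challenge mean matches the enrolled mean, which is the condition that allows the two histograms to be compared at a common effective distance.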

Techniques discussed in relation to FIGS. 12 and 13 scale pixel intensities in a histogram based on ratios of Gaussian parameters determined by measuring pixel intensities in acquired NIR images. Because the ratios of Gaussian parameters are determined based on acquired NIR images, an attacker may try to spoof a liveness authentication system as discussed herein by presenting an extremely large fake object at a far distance. This would decrease the subject reflectance to that of a live object while appearing as if it were still close to the camera. To mitigate this, secondary anti-spoofing methods can be utilized. Conventional techniques such as requiring eye glint, blink rate detection, natural motion detection, etc. could be incorporated. These techniques could make it significantly harder to spoof at a distance (e.g., it is very hard to print a poster-sized face and fake the eye blink behaviors without extensive cooperation of the subject).

In addition, if passive distance measuring techniques are available to measure distances of objects to the sensor acquiring the NIR image, distance measures so determined could be used in addition to distance estimation techniques discussed herein. Examples of passive distance measuring technologies include distance estimation using light polarization, lidar, and ultrasound. For example, lidar can determine a distance from a sensor to an object in a scene by measuring the time required for a pulse of light to travel from a sensor to an object and back. Polarization techniques can measure a difference in reflected light polarization between a background and an object in an NIR image. Ultrasound sensors can measure the time required for a pulse of ultrasound energy to travel from a transducer to an object and back. A distance value determined by light polarization, lidar or ultrasound can be averaged with an estimated distance value determined by techniques discussed herein to generate an estimated relative distance measure.

All of the techniques discussed herein regarding the classification of NIR image data can be subject to reinforcement learning. Reinforcement learning is performed by keeping statistics regarding the number of correct and incorrect results achieved by a liveness authentication system in use and using the statistical results to re-train the liveness authentication system. For example, assume a liveness authentication system is used to unlock a vehicle when approached by a valid user. A valid user is a user with prearranged permission to use a vehicle. In an example where the liveness authentication system fails to correctly authenticate a valid user and unlock the vehicle, the user can be forced to unlock the vehicle manually with a key or fob, or use a 2-factor authorization system such as entering a code sent to a cell phone number. When a user is forced to unlock the vehicle manually, the authentication system can store data regarding the incorrect authentication including the NIR image of the user that was incorrectly authenticated.

Determining what to do with data regarding the incorrect authentication can be based on a reward system. A reward system retrains the trained model corresponding to the authentication system depending upon the outcome of the failure to authenticate. If the potential user fails to gain access to the vehicle, it is assumed that the failed attempt was an attempted spoof, and the data is appended to a training dataset of likely spoof data. If the potential user gains access using one of the manual approaches, for example keys, fobs, or 2-factor authorization, the data is appended to a training dataset of false negatives to be corrected in the training process. The authentication system can be retrained based on the updated training dataset periodically or when the number of new images added to the training dataset exceeds a user-determined threshold. Retraining can be applied to both deterministic authentication systems based on Gaussian parameters and deep neural network-based systems.
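The routing of failed authentications into training datasets can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `RetrainingQueue`, its fields, and the string placeholders standing in for NIR images are hypothetical, and the retraining step itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class RetrainingQueue:
    """Collects failed authentications until retraining is warranted."""

    retrain_threshold: int  # user-determined number of new images
    likely_spoofs: list = field(default_factory=list)
    false_negatives: list = field(default_factory=list)
    new_since_training: int = 0

    def record_failure(self, nir_image, gained_access_manually):
        """Append a failed attempt to the dataset matching its outcome."""
        if gained_access_manually:
            # Key, fob or 2-factor access implies a valid user was rejected.
            self.false_negatives.append(nir_image)
        else:
            # No access gained, so the attempt is assumed to be a spoof.
            self.likely_spoofs.append(nir_image)
        self.new_since_training += 1

    def should_retrain(self):
        """True when the count of new images exceeds the threshold."""
        return self.new_since_training > self.retrain_threshold


queue = RetrainingQueue(retrain_threshold=1)
queue.record_failure("nir_image_a", gained_access_manually=True)
queue.record_failure("nir_image_b", gained_access_manually=False)
```

A periodic retraining schedule, the other trigger mentioned above, could be layered on top by also checking elapsed time since the last training run.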

Data regarding failure to authenticate a potential user can be federated or shared among a plurality of vehicles. The data regarding failure to authenticate can be uploaded to a cloud-based server that includes a central repository of training datasets. The uploaded NIR images and corresponding outcomes can be aggregated in updated training datasets and results of retraining based on the new data can be compared to results for the previous training. If the new training dataset improves performance, the new trained model can be pushed or downloaded to vehicles using the authentication system. Note that no personal data regarding users' identities needs to be uploaded to the cloud-based servers, only NIR images and outcomes. By federating new trained models based on training data uploaded from a plurality of vehicles, performance of an authentication system can be continuously improved over the lifetime of the system.

In addition, techniques described herein can be applied to article identification tasks which require that articles be authenticated to determine whether an article is real or counterfeit, for example. Any object having repeatable form and surface appearance can be authenticated using the techniques described herein. For example, a vehicle part can be authenticated to determine the presence of real leather or fake leather as part of an incoming part inspection process. Ambient light determination, image segmentation, and relative distance estimation as described herein can be applied to material spectroscopic techniques based on Gaussian distributions or processing using deep neural networks as described herein to authenticate articles.

FIG. 14 is a diagram of a flowchart, described in relation to FIGS. 1-13, of a process for authenticating subject liveness from an NIR image. Process 1400 can be implemented by a processor of a computing device such as a computing device 110, taking as input information from sensors, and executing commands, and outputting object information, for example. Process 1400 includes multiple blocks that can be executed in the illustrated order. Process 1400 could alternatively or additionally include fewer blocks or can include the blocks executed in different orders.

Process 1400 begins at block 1402, where a computing device acquires a first NIR image of a subject, for example a human face. This corresponds to a challenge image, where a first NIR image of a subject is acquired and processed to provide data to be used to test against an enrollment NIR image acquired at a previous time.

At block 1404 the acquired first NIR image is segmented to determine portions of the first NIR image that correspond to a first surface. In an example of techniques described herein, portions of the first NIR image that correspond to human skin are segmented to separate them from portions of the first NIR image that correspond to background, hair, clothing, etc. as described above in relation to FIG. 7.

At block 1406 a first measure of pixel count values corresponding to the segmented portions corresponding to human skin in the first NIR image is made. The first measure of pixel count data corresponds to a first histogram of pixel count data from the first NIR image. The first histogram is analyzed to determine Gaussian distribution parameters for pixels corresponding to human skin in the first NIR image as discussed above in relation to FIG. 2. A Gaussian mixture model can be used to separate the Gaussian distribution corresponding to human skin pixels from Gaussian distributions corresponding to non-skin surfaces as discussed above in relation to FIGS. 5 and 6.
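One way to separate two Gaussian distributions in pixel intensity data is expectation-maximization on a two-component mixture, sketched below in plain Python. This is an assumed, simplified stand-in for the Gaussian mixture model referred to above, not the specific procedure used; the function names, the initialization from the lower and upper halves of the data, and the sample intensities are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iterations=50):
    """Fit a two-component 1D Gaussian mixture; returns (mean, std, weight)
    tuples sorted by mean."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Initialize the means from the darker and brighter halves of the data.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    mean_all = sum(values) / len(values)
    spread = math.sqrt(sum((x - mean_all) ** 2 for x in values) / len(values))
    stds = [max(spread, 1e-3)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each pixel value.
        resp = []
        for x in values:
            p = [weights[k] * gaussian_pdf(x, means[k], stds[k])
                 for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each component from its weighted pixels.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)
            weights[k] = nk / len(values)
    return sorted(zip(means, stds, weights))


# Hypothetical intensities: one dark cluster and one bright cluster.
pixels = [18, 19, 20, 21, 22] * 4 + [115, 118, 120, 122, 125] * 4
components = fit_two_gaussians(pixels)
```

Which fitted component corresponds to skin depends on the scene and illumination; the mean and standard deviation of that component would then serve as the Gaussian distribution parameters compared in later blocks.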

At block 1408 the Gaussian distribution parameters corresponding to the pixel count data from the segmented portions of the first NIR image are compared to a second measure of pixel values including Gaussian distribution parameters corresponding to pixel count data from a previous histogram. The previous histogram is a measure of pixel values determined based on a second NIR image (enrollment NIR image) of a human subject as discussed above in relation to FIG. 2. The second NIR image was also segmented so that the second measure of pixel values including Gaussian distribution parameters correspond to human skin pixels from the second NIR image.

At block 1410 the first NIR image is tested to determine whether the first NIR image includes a live human subject by comparing the Gaussian distribution parameters from segmented portions of the first NIR image to stored Gaussian distribution parameters from a previously acquired second NIR image. If the Gaussian distribution parameters from the segmented portions of the first NIR image are equal to the Gaussian distribution parameters from the second NIR image, within empirically determined tolerances, the first NIR image is liveness authenticated. If the Gaussian distribution parameters from the first NIR image are not equal to the Gaussian distribution parameters from the second NIR image, within empirically determined tolerances, the first NIR image is not authenticated.
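The tolerance test at block 1410 can be sketched as a per-parameter comparison. The function names, the choice of mean and standard deviation as the compared parameters, and all numeric values below are hypothetical; in practice the tolerances would be determined empirically as described above.

```python
def within_tolerance(challenge_value, enrolled_value, tolerance):
    """True when two parameter values differ by no more than the tolerance."""
    return abs(challenge_value - enrolled_value) <= tolerance


def liveness_authenticated(challenge_params, enrolled_params, tolerances):
    """Authenticate only if every enrolled parameter is matched."""
    return all(
        within_tolerance(challenge_params[name], enrolled_params[name],
                         tolerances[name])
        for name in enrolled_params
    )


# Hypothetical enrolled skin distribution and tolerances.
enrolled = {"mean": 120.0, "std": 4.0}
tolerances = {"mean": 5.0, "std": 1.5}

live_result = liveness_authenticated({"mean": 123.0, "std": 4.5},
                                     enrolled, tolerances)
spoof_result = liveness_authenticated({"mean": 140.0, "std": 4.5},
                                      enrolled, tolerances)
```

A true result corresponds to proceeding to block 1412, and a false result to block 1414.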

At block 1412, the first NIR image has been authenticated for liveness and is output to a facial recognition software program for further processing to determine the identity of the subject in the first NIR image. Following block 1412 the process 1400 ends.

At block 1414, the first NIR image has not been authenticated for liveness and is not output to a facial recognition software program for further processing. Following block 1414 the process 1400 ends.

Computing devices such as those discussed herein generally each include commands executable by one or more computing devices such as those identified above, and for carrying out blocks or steps of processes described above. For example, process blocks discussed above may be embodied as computer-executable commands.

Computer-executable commands may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, Java™, C, C++, Python, Julia, SCALA, Visual Basic, JavaScript, Perl, HTML, etc. In general, a processor (e.g., a microprocessor) receives commands, e.g., from a memory, a computer-readable medium, etc., and executes these commands, thereby performing one or more processes, including one or more of the processes described herein. Such commands and other data may be stored in files and transmitted using a variety of computer-readable media. A file in a computing device is generally a collection of data stored on a computer-readable medium, such as a storage medium, a random-access memory, etc.

A computer-readable medium includes any medium that participates in providing data (e.g., commands), which may be read by a computer. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, etc. Non-volatile media include, for example, optical or magnetic disks and other persistent memory. Volatile media include dynamic random-access memory (DRAM), which typically constitutes a main memory. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.

All terms used in the claims are intended to be given their plain and ordinary meanings as understood by those skilled in the art unless an explicit indication to the contrary is made herein. In particular, use of the singular articles such as “a,” “the,” “said,” etc. should be read to recite one or more of the indicated elements unless a claim recites an explicit limitation to the contrary.

The term “exemplary” is used herein in the sense of signifying an example, e.g., a reference to an “exemplary widget” should be read as simply referring to an example of a widget.

The adverb “approximately” modifying a value or result means that a shape, structure, measurement, value, determination, calculation, etc. may deviate from an exactly described geometry, distance, measurement, value, determination, calculation, etc., because of imperfections in materials, machining, manufacturing, sensor measurements, computations, processing time, communications time, etc.

In the drawings, the same reference numbers indicate the same elements. Further, some or all of these elements could be changed. With regard to the media, processes, systems, methods, etc. described herein, it should be understood that, although the steps or blocks of such processes, etc. have been described as occurring according to a certain ordered sequence, such processes could be practiced with the described steps performed in an order other than the order described herein. It further should be understood that certain steps could be performed simultaneously, that other steps could be added, or that certain steps described herein could be omitted. In other words, the descriptions of processes herein are provided for the purpose of illustrating certain embodiments, and should in no way be construed so as to limit the claimed invention.

1. A computer, comprising: a processor; and a memory, the memory including instructions executable by the processor to: acquire a first image by illuminating a first object with a first light beam; segment the first image of the first object to determine regions that correspond to a first surface material; determine a first measure of pixel values in regions of the first image that correspond to the first surface material; perform a comparison of the first measure of pixel values to a second measure of pixel values determined from a second image of a second object, wherein the second image is previously acquired by illuminating the second object with a second light beam; and when the comparison determines that the first measure is equal to the second measure of pixel values within a tolerance, determine that the first object and the second object are a same object.
2. The computer of claim 1, wherein the first light beam is a near infrared light beam and the second light beam is a near infrared light beam.
3. The computer of claim 2, wherein the first and second images are acquired with a camera that acquires near infrared pixels, red pixels, green pixels and blue pixels.
4. The computer of claim 1, the instructions including further instructions to segment the first image into first surface material and non-first surface material regions by processing the first image with a convolutional neural network.
5. The computer of claim 4, the instructions including further instructions to train the neural network to segment the first image into first surface material and non-first surface material regions using ground truth images segmented into first surface material and non-first surface material regions by human operators.
6. The computer of claim 1, the instructions including further instructions to segment the first image into first surface material and non-first surface material regions by applying a mask to the first image based on determining locations of facial features including eyes and a mouth.
7. The computer of claim 1, the instructions including further instructions to segment the first image into first surface material and non-first surface material regions by determining similarly-sized random patches in the first image and processing the random patches with a second convolutional neural network to determine patches that include non-skin pixels, patches that include skin pixels and patches that include both skin and non-skin pixels.
8. The computer of claim 7, wherein the patches that include first surface material and non-first surface material are divided into sub-patches and reprocessed to determine skin and non-skin sub-patches.
9. The computer of claim 8, wherein a histogram of skin regions is compared to a previously acquired histogram by applying a Gaussian mixture model to the histograms to determine Gaussian distributions to compare.
10. The computer of claim 1, wherein the first and second measures of pixel values are first and second mean values calculated on first and second histograms of pixel values included in first and second images, respectively.
11. The computer of claim 10, wherein the first and second mean values are calculated based on a Gaussian mixture model applied to the first and second histograms, respectively.
12. The computer of claim 1, the instructions including further instructions to output the determination as to whether the first object and the second object are a same type of object.
13. The computer of claim 1, the instructions including further instructions to, when the comparison determines that each of the first object and the second object is a human face, perform human identification testing.
14. The computer of claim 13, the instructions including further instructions to, when the comparison determines that each of the first object and the second object is a human face and are the same object, operate a vehicle.
15. A method comprising: acquiring a first image by illuminating a first object with a first light beam; segmenting the first image of the first object to determine regions that correspond to a first surface material; determining a first measure of pixel values in regions of the first image that correspond to the first surface material; performing a comparison of the first measure of pixel values to a second measure of pixel values determined from a second image of a second object, wherein the second image is previously acquired by illuminating the second object with a second NIR light beam; and when the comparison determines that the first measure is equal to the second measure of pixel values within a tolerance, determining that the first object and the second object are a same object.
16. The method of claim 15, wherein the first light beam is a near infrared light beam and the second light beam is a near infrared light beam.
17. The method of claim 16, wherein the first image and the second image are acquired with a camera that acquires near infrared pixels, red pixels, green pixels and blue pixels.
18. The method of claim 15, the instructions including further instructions to segment the first image into first surface material and non-first surface material regions by processing the first image with a convolutional neural network.
19. The method of claim 18, the instructions including further instructions to train the neural network to segment the first image into first surface material and non-first surface material regions using ground truth images segmented into first surface material and non-first surface material regions by human operators.
20. The method of claim 15, the instructions including further instructions to segment the first image into first surface material and non-first surface material regions by applying a mask to the first image based on determining locations of facial features including eyes and a mouth.